ABSTRACT

A Volterra series approach has been applied to the identification of 
nonlinear systems which are described by a neural network model. 
A procedure is outlined by which a mathematical model can be 
developed from experimental data obtained from the network 
structure. Applications of the results to control of robotic systems 
are discussed. 


INTRODUCTION 

The Volterra series(1) approach to the identification of nonlinear
systems presented in this paper is a natural extension of
earlier results in modeling of linear systems(2). In reference 2, the
impulse response of a linear dynamical system is shown to be given 
by the weights of the network model. For nonlinear systems, in 
addition to the impulse response of the linear approximation, the 
higher order Volterra kernels must be expressed in terms of the 
parameters of the trained network model. This relationship is 
reported in this paper. The result obtained means that it is possible 
not only to obtain a neural network model of the nonlinear dynamics, 
but also to represent this model by a mathematical expression. This 
opens a broad range of applications for the neural network modeling 
of nonlinear dynamical systems. The Volterra series appeared recently
in the neural network literature in references 3 and 4. Both
papers showed that a model of the Volterra system can model a
nonlinear analytic system. However, this result follows directly from
the representation theorem proved in reference 5. Some interesting
results of neural network applications to nonlinear systems control
can be found in reference 6.

In robotics, there are many places where nonlinear processes exist. 
The nonlinearities to be controlled include motor dynamics, flexible 
beam vibrations, harmonic drive stiffness, gear backlash, and full 
arm dynamics. Some of these nonlinearities, for example, beam 
vibrations and full arm dynamics, can be classified as analytic 
nonlinearities. This paper shows how to obtain a mathematical 
model of these nonlinearities using experimental data collected from 
the system under investigation. 


In manipulator control, it is required that the manipulator respond
quickly and accurately in spite of existing nonlinearities and
inter-joint couplings. To obtain a good design, one should use as much
a priori knowledge as possible and complement the design with an
adaptive fine-tuning algorithm. In principle, this is the structure of
the control scheme proposed by Koivo(7). In this structure, shown in
figure 1, the primary controller is developed based on the available
model of the manipulator and the secondary controller compensates
for unmodeled dynamics. We propose to investigate the design of the
primary controller using a nonlinear model of the manipulator
obtained as a Volterra series representation of a neural network
model. The system fine tuning can be done, if necessary, by an
adaptive loop using a Linear Quadratic Gaussian approach. In the
proposed design, the model-based approach(8) and the performance-based
approach(9) would be merged to obtain better performance.



Figure 1.- Manipulator System 
MODELING OF LINEAR SYSTEMS


The input-output relation of a system described by a linear 
differential equation may be given by a convolution integral 

$$y(t) = \int_0^\infty h(\tau)\, x(t-\tau)\, d\tau \qquad (1)$$

that specifies the output y(t) in terms of the input x(t) and the system
impulse response $h(\tau)$. The discrete-time representation has the
following form:

$$y(k) = \sum_{n=0}^{\infty} h(n)\, x(k-n) \qquad (2)$$

where the arguments k and n are shorthand for kT and nT with T 
being the sampling interval to be selected for any particular system. 
Information on the system bandwidth of interest is needed to choose a 
proper value for T. 

The relation (2) becomes approximate when a finite number of terms r
is considered, that is, when

$$y(k) = \sum_{n=0}^{r-1} h(n)\, x(k-n) \qquad (3)$$

which results in unmodeled dynamics, represented by the truncated
terms. Equation (3), written using standard neural network notation, is

$$y(k) = \sum_{n=0}^{r-1} w_{1n}\, x(k-n) \qquad (4)$$

that is, the finite (truncated) impulse response is given by:

$$[\,h(0)\;\; h(1)\;\; \cdots\;\; h(r-1)\,] = [\,w_{10}\;\; w_{11}\;\; \cdots\;\; w_{1,r-1}\,] \qquad (5)$$

This relationship, at any time instant k, can be viewed as a
representation of a neural network with r inputs x(k-i), i = 0, 1, ..., r-1, and
a single output y(k), generated by a single linear neuron. This network
can be considered a member of the $\Sigma^r$ class of feedforward
networks(5).
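
As a minimal sketch of equations (4) and (5), the following Python fragment evaluates a single linear neuron on delayed inputs and drives it with a unit impulse. The function name `fir_neuron` and the numerical weights are illustrative assumptions, not values from the paper.

```python
def fir_neuron(weights, x, k):
    """Single linear neuron, eq. (4): y(k) = sum_n w_1n * x(k - n).

    Samples before the start of the record (k - n < 0) are taken as zero.
    """
    return sum(w * x[k - n] for n, w in enumerate(weights) if k - n >= 0)


# Hypothetical weights w_10, w_11, w_12 (r = 3).
weights = [0.5, 0.3, -0.1]

# A unit impulse at k = 0 followed by zeros.
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]

# The response to the impulse reproduces the weights, as stated by eq. (5),
# and is zero once the impulse has left the r-sample window.
response = [fir_neuron(weights, impulse, k) for k in range(len(impulse))]
```

Reading the weights off the impulse response in this way is only exact for the truncated model (3); the terms beyond r-1 remain unmodeled.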


Once r is fixed for a linear system, no modeling improvement can be
obtained by increasing the number of nodes and/or the number of
layers. However, additional nodes or layers introduce structural
redundancy and thereby robustness to neuron failure.
Consequently, the time needed to recover from a failure will be shorter.

The network model of a linear system, discussed so far, is shown in
figure 2, in which $q^{-1}$ denotes the unit delay operator, that is,

$$q^{-1} x(k) = x(k-1).$$

This network is described by the following difference equation: 

$$(w_{10} + w_{11} q^{-1} + w_{12} q^{-2} + \cdots + w_{1n} q^{-n})\, x(k) = y(k) \qquad (6)$$

which is equivalent to equation (4). Also, (6) can be represented as the 
vector product 


$$y(k) = \phi^T(k)\, \theta(k) \qquad (7)$$

with $\phi^T(k) = [\,x(k)\;\; x(k-1)\;\; \cdots\;\; x(k-n)\,]$ and $\theta^T(k) = [\,w_{10}(k)\;\; w_{11}(k)\;\; \cdots\;\; w_{1n}(k)\,]$. It
should be emphasized that by using the finite input sequence an
approximate model (7) of the system (2), known as the Finite Impulse
Response (FIR) model, is obtained.
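
The regression form (7) suggests how the weights might be obtained from recorded input-output data. The sketch below uses the normalized LMS rule, which is an assumption made here for illustration; the paper does not state which training rule was used for the linear case. The data are synthetic, generated from a hypothetical impulse response `h_true`, and the regressor holds r samples.

```python
import random


def regressor(x, k, r):
    """phi(k) = [x(k), x(k-1), ..., x(k-r+1)], zero-padded before k = 0."""
    return [x[k - n] if k - n >= 0 else 0.0 for n in range(r)]


def train_fir_nlms(x, y, r, mu=0.5, eps=1e-12):
    """Fit theta in y(k) = phi(k)^T theta with the normalized LMS rule."""
    theta = [0.0] * r
    for k in range(len(x)):
        phi = regressor(x, k, r)
        error = y[k] - sum(p * t for p, t in zip(phi, theta))
        norm = eps + sum(p * p for p in phi)
        theta = [t + mu * error * p / norm for t, p in zip(theta, phi)]
    return theta


# Synthetic experiment: a hypothetical "true" impulse response of length r = 3.
random.seed(0)
h_true = [0.8, -0.4, 0.2]
x = [random.uniform(-1.0, 1.0) for _ in range(2000)]
y = [sum(h * p for h, p in zip(h_true, regressor(x, k, 3)))
     for k in range(len(x))]

# With noise-free data and a matching model order, theta approaches h_true.
theta = train_fir_nlms(x, y, r=3)
```

With measured data the recovered weights would only approximate the impulse response, because of noise and the truncation to r terms.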





Figure 2.- Single node FIR network 
MODELING OF NONLINEAR SYSTEMS


ANALYTIC SYSTEMS 

Let a Single-Input, Single-Output (SISO) nonlinear dynamical system be
described by a functional

$$y(t) = F[x(t)] \qquad (8)$$

where x(t) is the input, y(t) is the output and the functional F is
analytic, such that it can be represented exactly by a converging
infinite series of the following form:

$$\begin{aligned}
y(t) ={}& \int_0^\infty h_1(\tau)\, x(t-\tau)\, d\tau + \int_0^\infty\!\!\int_0^\infty h_2(\tau_1,\tau_2)\, x(t-\tau_1)\, x(t-\tau_2)\, d\tau_1\, d\tau_2 + \cdots \\
&+ \int_0^\infty\!\!\cdots\!\int_0^\infty h_n(\tau_1,\ldots,\tau_n)\, x(t-\tau_1)\, x(t-\tau_2) \cdots x(t-\tau_n)\, d\tau_1\, d\tau_2 \cdots d\tau_n + \cdots
\end{aligned} \qquad (9)$$

Such a system can be represented to any desired degree of accuracy by
a finite series of the form of (9). This equation, known as the Volterra
series expansion, can be interpreted as a functional generalization of
the Taylor series expansion and represents the solution to a large class
of nonlinear differential equations. For a linear system only the first
term in expression (9) is nonzero and represents the convolution
integral, with $h_1(\tau)$ being the impulse response of the system.

In expression (9), $h_n(\tau_1,\ldots,\tau_n)$, n = 2, 3, ..., are higher order Volterra
kernels, or higher order impulse responses, introduced to describe
nonlinear dynamic behavior. If (8) is discretized, then (9) assumes the
form

$$y(k) = \sum_{n=1}^{\infty} y_n(k) \qquad (10)$$

where

$$y_n(k) = \sum_{n_1=0}^{\infty} \cdots \sum_{n_n=0}^{\infty} h_n(n_1, n_2, \ldots, n_n)\, x(k-n_1) \cdots x(k-n_n) \qquad (11)$$
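
To make the discrete series (10)-(11) concrete, the sketch below evaluates a truncated version in which every sum runs over a finite memory of r samples and only the supplied orders are included. The dictionary layout of `kernels` and the numerical values are assumptions chosen for illustration.

```python
from itertools import product


def volterra_output(kernels, x, k, r):
    """Truncated form of eqs. (10)-(11).

    kernels[p] maps an index tuple (n_1, ..., n_p) to h_p(n_1, ..., n_p);
    each sum runs over 0..r-1 and missing entries are treated as zero.
    """
    y = 0.0
    for order, h in kernels.items():
        for idx in product(range(r), repeat=order):
            coeff = h.get(idx, 0.0)
            if coeff == 0.0:
                continue
            term = coeff
            for n in idx:
                term *= x[k - n] if k - n >= 0 else 0.0
            y += term
    return y


# Hypothetical kernels with memory r = 2: a linear part plus one quadratic term.
kernels = {
    1: {(0,): 1.0, (1,): 0.5},
    2: {(0, 1): 0.25},
}
x = [2.0, 4.0]

# At k = 1: 1.0*x(1) + 0.5*x(0) + 0.25*x(1)*x(0) = 4 + 1 + 2.
y1 = volterra_output(kernels, x, k=1, r=2)
```

Setting the second-order kernel to zero reduces this to the FIR convolution (3), which is the sense in which the series generalizes the linear model.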


ACTIVATION FUNCTION SUITABLE FOR MODELING OF 
NONLINEAR DYNAMICAL SYSTEMS 

Let us assume that equation (11) is to be modeled by a network.
This implies the requirement that the number of inputs to the network
is finite. Accordingly, assume that the equation can be modeled by a $\Sigma^r$
network with the network input defined as follows:

$$x^T = [\,x(k)\;\; x(k-1)\;\; \cdots\;\; x(k-r+1)\,] \qquad (12)$$

where r is the number of time-delayed inputs, and with an activation
function $\Psi(\cdot)$, an operator which is applied to the sum of the
weighted inputs to a node to produce an output from the node.

We have assumed so far that the modeling of nonlinear dynamical
systems will require a multiple node $\Sigma^r$ network.

Definition: The activation function for $\Sigma^r$ networks, suitable for
modeling nonlinear systems, is defined as a function $\Psi: \mathbb{R} \to [a, b]$ which
is differentiable, nondecreasing, and satisfies $\lim_{\lambda\to\infty} \Psi(\lambda) = b$ and $\lim_{\lambda\to-\infty} \Psi(\lambda) = a$.

Two examples of an activation function, both widely used in $\Sigma^r$
networks, are the logistic function and the hyperbolic tangent
function:

1. Logistic function $\Psi: \mathbb{R} \to [0, 1]$, defined by the following equation:

$$\Psi(\lambda) = \frac{1}{1+e^{-\lambda}}, \quad \text{with the derivative} \quad \Psi'(\lambda) = \Psi(\lambda)\,(1-\Psi(\lambda))$$

2. Hyperbolic tangent function $\Psi: \mathbb{R} \to [-1, 1]$, defined by the following
equation:

$$\Psi(\lambda) = \tanh(\lambda) = \frac{1-e^{-2\lambda}}{1+e^{-2\lambda}}, \quad \text{with the derivative} \quad \Psi'(\lambda) = 1 - \Psi^2(\lambda)$$

Note that the computation of the derivatives of both functions, 
repeatedly performed while using a gradient method such as the error 
backpropagation for training, is computationally efficient. 
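
The efficiency remark can be illustrated with a short sketch: both derivatives are computed from the node output alone, with no further exponentials, and are cross-checked here against a central difference. The helper names and the test point `lam = 0.7` are assumptions of this example.

```python
import math


def logistic(lam):
    return 1.0 / (1.0 + math.exp(-lam))


def logistic_deriv_from_output(out):
    """Psi'(lambda) = Psi(lambda) * (1 - Psi(lambda)), using the node output."""
    return out * (1.0 - out)


def tanh_deriv_from_output(out):
    """Psi'(lambda) = 1 - Psi(lambda)**2, using the node output."""
    return 1.0 - out * out


def central_difference(f, lam, step=1e-6):
    """Numerical derivative, used only as a cross-check."""
    return (f(lam + step) - f(lam - step)) / (2.0 * step)


lam = 0.7
err_logistic = abs(logistic_deriv_from_output(logistic(lam))
                   - central_difference(logistic, lam))
err_tanh = abs(tanh_deriv_from_output(math.tanh(lam))
               - central_difference(math.tanh, lam))
```

Because the forward pass already stores each node output, the backward pass needs only one multiplication and one subtraction per node for either function.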

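According to the definition of $\Psi(\cdot)$, the following functions are not
activation functions:

1. $\Psi(\lambda) = \operatorname{sign}(\lambda)$

2. $\Psi(\lambda) = \begin{cases} a & \text{for } \lambda < a/\alpha \\ \alpha\lambda & \text{for } a/\alpha \le \lambda \le b/\alpha \\ b & \text{for } \lambda > b/\alpha \end{cases}$

with a < b and $\alpha$ being a positive constant.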
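
A brief numerical sketch of why these two functions fail the differentiability requirement: the saturated linear function has different left and right slopes at a breakpoint, and the sign function jumps at zero. The parameter values a = -1, b = 1, alpha = 2 and the helper names are assumptions made for this example.

```python
def sign(lam):
    return (lam > 0) - (lam < 0)


def saturated_linear(lam, a=-1.0, b=1.0, alpha=2.0):
    """Piecewise-linear function of item 2 (a, b, alpha are example values)."""
    if lam < a / alpha:
        return a
    if lam > b / alpha:
        return b
    return alpha * lam


def one_sided_slopes(f, point, step=1e-6):
    """Return (left slope, right slope) of f at the given point."""
    left = (f(point) - f(point - step)) / step
    right = (f(point + step) - f(point)) / step
    return left, right


# At the upper breakpoint lambda = b/alpha = 0.5 the two slopes disagree:
# alpha on the left, zero on the right.
left, right = one_sided_slopes(saturated_linear, 0.5)

# sign() jumps by 2 across zero, so no finite derivative exists there.
jump = sign(1e-9) - sign(-1e-9)
```
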

Now, let us justify the definition of $\Psi(\cdot)$ and the choice of one of those
activation functions. First of all, let us realize that the activation
function of a node should, for a given input x, encode within finite
limits the amount of the preferred feature represented by this node
and generated by this input. This preferred feature is abstracted in
the learning process by the vector of weights w. The closeness of two
vectors w and x is given in the vector space by their inner product

$$w \cdot x = \sum_{i=1}^{r} w_i x_i \qquad (13)$$

$$w \cdot x = \|w\|\, \|x\| \cos\theta \qquad (14)$$

where $\theta$ is the angle between the vectors w and x. By requiring the
activation to be nondecreasing, the sense of closeness established by
the inner product is preserved. Furthermore, since the derivative of
the activation function is used in training the backpropagation
network, the derivative must exist. Continuity is not sufficient.
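
The role of the inner product as a closeness measure can be sketched numerically. In the fragment below the vectors are hypothetical; `cos_angle` simply solves (14) for the cosine, and the last line checks that a nondecreasing activation (tanh is used here) keeps the ordering given by (13).

```python
import math


def inner(w, x):
    """Eq. (13): sum of componentwise products."""
    return sum(wi * xi for wi, xi in zip(w, x))


def cos_angle(w, x):
    """Solve eq. (14) for cos(theta)."""
    return inner(w, x) / (math.sqrt(inner(w, w)) * math.sqrt(inner(x, x)))


w = [1.0, 2.0, 2.0]           # hypothetical weight vector
x_aligned = [0.5, 1.0, 1.0]   # parallel to w
x_other = [2.0, -1.0, 0.0]    # orthogonal to w

# A nondecreasing activation keeps the ordering of the inner products.
ordered = math.tanh(inner(w, x_aligned)) > math.tanh(inner(w, x_other))
```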

Also, note that for both the logistic function and the hyperbolic
tangent function the requirements formulated so far hold. However,
the linear approximation of tanh(·) in the neighborhood of zero is the
straight line passing through zero. It was shown in an earlier paper(2)
that with such an activation function the product of the weight
matrices of a multiple layer feedforward network, trained to model
linear dynamics, represents the impulse response of a simulated linear
system. Furthermore, it is shown below that if the single hidden layer
network, using the hyperbolic tangent as an activation function, is
trained to represent the nonlinear system (10), then its weights form
the kernel $h_1(\cdot)$ of the functional $y_1(k)$ given by (11). The kernel $h_1(\cdot)$
is interpreted as the impulse response of the linear approximation of a
nonlinear system. It is important to have a simple relation between
$h_1(\cdot)$ and the network weights. Consequently, the hyperbolic tangent is
a better choice of an activation function.


NETWORK REPRESENTATION OF SISO NONLINEAR SYSTEMS 

Using the hyperbolic tangent as the activation function, a nonlinear
system can be modeled by a delay network generating the vector x,
defined by (12), and followed by a single hidden layer network $\Sigma^r(\Psi)$
with q nodes. In other words, the model is defined as

$$f: \mathbb{R}^r \to \mathbb{R}: \quad f(x) = \sum_{j=1}^{q} w_j \tanh(\mathrm{net}_j), \quad w_j \in \mathbb{R} \qquad (15)$$

where

$$\mathrm{net}_j = b_j + \sum_{i=1}^{r} w_{ji}\, x(k-i+1) \quad \text{and} \quad b_j \in \mathbb{R}$$

Therefore, the dynamics of a complete network model are defined by
the weight matrix $W = [w_{ji}]$, the output weight vector $w = [w_1\;\; w_2\;\; \cdots\;\; w_q]^T$,
and the bias vector $b = [b_1\;\; b_2\;\; \cdots\;\; b_q]^T$. Hornik et al.(5) proved that the $\Sigma^r$
network is capable of approximating any Borel measurable function
from $\mathbb{R}^r$ to $\mathbb{R}$ to any desired degree of accuracy, provided sufficiently
many hidden nodes are available. If the system to be modeled has the
input-output relation y = F(x), with the input vector $x = [x(k)\;\; x(k-1)\;\; \cdots\;\; x(k-r+1)]^T$,
such that at any sampling instant k it represents a nonlinear
Borel measurable function from $\mathbb{R}^r$ to $\mathbb{R}$, the following claim follows
directly from the representation theorem in reference 5.


Claim: Under the assumptions made above and for a fixed r, the 
accuracy of the approximation of a system modeled increases with an 
increase of the number q of available hidden nodes. This accuracy can 
be improved to any desired degree by increasing both r and q. 


Note that under the assumptions made, the signal-dependent 
nonlinearities such as hysteresis are excluded. 
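
The network of equation (15) can be sketched as a short forward pass. The function name `network_output`, the zero-based storage of the weights, and all numerical values below are assumptions made for illustration; they do not come from a trained model.

```python
import math


def network_output(W, w, b, x_vec):
    """Eq. (15): f(x) = sum_j w_j * tanh(b_j + sum_i W[j][i] * x_vec[i]).

    W[j][i] stores w_{j,i+1} (zero-based i), and x_vec[i] = x(k - i).
    """
    total = 0.0
    for w_j, b_j, row in zip(w, b, W):
        net_j = b_j + sum(w_ji * x_i for w_ji, x_i in zip(row, x_vec))
        total += w_j * math.tanh(net_j)
    return total


# Hypothetical parameters: q = 2 hidden nodes, r = 2 delayed inputs.
W = [[0.5, -0.3],
     [0.2, 0.4]]
w = [1.0, -0.7]
b = [0.0, 0.0]

y_zero = network_output(W, w, b, [0.0, 0.0])
y_pos = network_output(W, w, b, [0.1, 0.05])
y_neg = network_output(W, w, b, [-0.1, -0.05])
```

With all biases set to zero the map is odd in its input, so `y_neg` equals `-y_pos` and `y_zero` is zero.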


Example of Obtaining Volterra Kernels

First, we shall partially analyze the network, showing the relations
between the weights of a trained network and the Volterra kernels
of the system modeled. Assume that the network is of the form
shown in figure 2, with only one node (q=1), two inputs (r=2), and no bias (b=0).
Assuming that the nonlinear system (8) is analytic, the network
output y*(k) is given by:

$$y^*(k) = \tanh(w_{11}\, x(k) + w_{12}\, x(k-1)) \qquad (16)$$

Expanding y*(k) into a Volterra series, one obtains

$$\begin{aligned}
y^*(k) ={}& w_{11}\, x(k) + w_{12}\, x(k-1) - \tfrac{1}{3} w_{11}^3\, x^3(k) - w_{11}^2 w_{12}\, x^2(k)\, x(k-1) \\
&- w_{11} w_{12}^2\, x(k)\, x^2(k-1) - \tfrac{1}{3} w_{12}^3\, x^3(k-1) \\
&+ (\text{fifth-order terms}) - (\text{seventh-order terms}) + \cdots
\end{aligned} \qquad (17)$$

If a nonzero bias is assumed, then the even-order terms will appear in 
the expansion (17). It was assumed that the system output y(k) can be 
represented by a Volterra series, that is, equation (10) holds. 
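
As a numerical check of the expansion (17), the sketch below compares the first- and third-order terms with the exact tanh output for small inputs. The weights and input samples are hypothetical, and `cubic_expansion` is a name introduced here.

```python
import math


def cubic_expansion(w11, w12, x0, x1):
    """Terms of eq. (17) up to third order; x0 = x(k), x1 = x(k-1)."""
    return (w11 * x0 + w12 * x1
            - w11**3 * x0**3 / 3.0
            - w11**2 * w12 * x0**2 * x1
            - w11 * w12**2 * x0 * x1**2
            - w12**3 * x1**3 / 3.0)


w11, w12 = 0.6, -0.4      # hypothetical trained weights
x0, x1 = 0.3, 0.2         # small inputs, so higher-order terms are minor

u = w11 * x0 + w12 * x1
exact = math.tanh(u)
approx = cubic_expansion(w11, w12, x0, x1)

# The first neglected term of the tanh series is (2/15) * u**5.
residual = abs(exact - approx)
fifth_order = 2.0 / 15.0 * abs(u) ** 5
```

For these values the residual is of the size of the first neglected fifth-order term, which is what the alternating structure of (17) leads one to expect.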

Assuming as above r=2 and b=0,

$$\begin{aligned}
y(k) ={}& \sum_{n=0}^{1} h_1(n)\, x(k-n) + \sum_{n=0}^{1}\sum_{m=0}^{1} h_2(n,m)\, x(k-n)\, x(k-m) \\
&+ \sum_{n=0}^{1}\sum_{m=0}^{1}\sum_{l=0}^{1} h_3(n,m,l)\, x(k-n)\, x(k-m)\, x(k-l) + \cdots
\end{aligned} \qquad (18)$$

which, showing separately first-, second- and third-order terms, can be
rewritten as

$$\begin{aligned}
y(k) ={}& h_1(0)\, x(k) + h_1(1)\, x(k-1) \\
&+ h_2(0,0)\, x^2(k) + [h_2(0,1) + h_2(1,0)]\, x(k)\, x(k-1) + h_2(1,1)\, x^2(k-1) \\
&+ h_3(0,0,0)\, x^3(k) \\
&+ [h_3(1,0,0) + h_3(0,1,0) + h_3(0,0,1)]\, x^2(k)\, x(k-1) \\
&+ [h_3(1,1,0) + h_3(1,0,1) + h_3(0,1,1)]\, x(k)\, x^2(k-1) \\
&+ h_3(1,1,1)\, x^3(k-1) + \cdots
\end{aligned}$$

The coefficients in this expression are to be equated to the sum of
proper coefficients expressed in terms of network weights in each of q
expressions of the type (17). From this analysis, that is, assuming q=1,
one can find that

$$h_1(0) = w_{11}, \qquad h_1(1) = w_{12}$$

In other words, one can obtain Volterra kernels from the trained
network. Some of those kernels will not be uniquely defined. Instead,
their sum, e.g., $h_3(1,0,0) + h_3(0,1,0) + h_3(0,0,1)$, will be equal to a constant
uniquely defined by the network parameters. In such a case, this
constant can be arbitrarily distributed among the components of this
sum. For any distribution, the model obtained will have the same
properties.
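
The freedom in distributing such a constant can be illustrated as follows. For the single-node example, the coefficient of x^2(k)x(k-1) in (17) is -w11^2 w12; the sketch assigns it to the third-order kernel in two different ways and evaluates the resulting third-order contribution. The weights, inputs, and dictionary layout are assumptions of this example, and only the kernel entries that multiply x^2(k)x(k-1) are populated.

```python
from itertools import product


def third_order_term(h3, x0, x1):
    """Third-order part of eq. (18) for r = 2; x0 = x(k), x1 = x(k-1)."""
    x = (x0, x1)
    return sum(h3.get((n, m, l), 0.0) * x[n] * x[m] * x[l]
               for n, m, l in product(range(2), repeat=3))


w11, w12 = 0.6, -0.4                 # hypothetical weights of the q = 1 example
c = -w11**2 * w12                    # coefficient of x^2(k) x(k-1) in eq. (17)

# Distribution 1: the whole constant is assigned to h3(0,0,1).
h3_lumped = {(0, 0, 1): c}
# Distribution 2: the constant is shared equally (a symmetric kernel).
h3_symmetric = {(1, 0, 0): c / 3, (0, 1, 0): c / 3, (0, 0, 1): c / 3}

x0, x1 = 0.3, 0.2
y_lumped = third_order_term(h3_lumped, x0, x1)
y_symmetric = third_order_term(h3_symmetric, x0, x1)
```

Both choices give the same output up to rounding. The equal split corresponds to a symmetric kernel, which is one convenient convention among the admissible distributions.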

Using the network (15) for modeling, the output y(k) of a nonlinear
system is, in general, approximated by

$$y^*(k) = \sum_{j=1}^{q} w_j \tanh(\mathrm{net}_j) \qquad (19)$$

If the function tanh(·) is replaced by its Taylor series, then

$$y^*(k) = \sum_{j=1}^{q} w_j \left[\mathrm{net}_j - \tfrac{1}{3}(\mathrm{net}_j)^3 + \tfrac{2}{15}(\mathrm{net}_j)^5 - \cdots\right] \qquad (20)$$

with $\mathrm{net}_j = b_j + \sum_{i=1}^{r} w_{ji}\, x(k-i+1)$. In equation (20), the expression in
brackets corresponds to the expression given by (17) for the specific
example discussed above. On the other hand, according to (10),

$$\begin{aligned}
y(k) ={}& \sum_{n=0}^{r-1} h_1(n)\, x(k-n) + \sum_{n=0}^{r-1}\sum_{m=0}^{r-1} h_2(n,m)\, x(k-n)\, x(k-m) \\
&+ \sum_{n=0}^{r-1}\sum_{m=0}^{r-1}\sum_{l=0}^{r-1} h_3(n,m,l)\, x(k-n)\, x(k-m)\, x(k-l) + \cdots
\end{aligned} \qquad (21)$$

The coefficients in this expression are then equated to the coefficients
in the expression for y*(k) given by equation (20). As a result one
obtains the following equations specifying the first three Volterra
kernels in terms of the parameters of the network model:

$$h_1(n) = \sum_{j=1}^{q} w_j\, w_{j,n+1}\, (1 - \tanh^2(b_j)), \qquad n = 0, \ldots, r-1 \qquad (22)$$

$$h_2(n,m) = \sum_{j=1}^{q} w_j\, w_{j,n+1}\, w_{j,m+1}\, \frac{-2\tanh(b_j) + 2\tanh^3(b_j)}{2!}, \qquad n, m = 0, \ldots, r-1 \qquad (23)$$

$$h_3(n,m,l) = \sum_{j=1}^{q} w_j\, w_{j,n+1}\, w_{j,m+1}\, w_{j,l+1}\, \frac{-2 + 8\tanh^2(b_j) - 6\tanh^4(b_j)}{3!}, \qquad n, m, l = 0, \ldots, r-1 \qquad (24)$$
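
One way to see where the bias-dependent factors in these kernel formulas come from is to expand each hidden node about its bias rather than about zero. The sketch below introduces two shorthands that are not in the original: $u_j$ for the input-dependent part of $\mathrm{net}_j$, and $t_j = \tanh(b_j)$.

```latex
% Assumed shorthand: u_j = \sum_{i=1}^{r} w_{ji}\, x(k-i+1), so that net_j = b_j + u_j,
% and t_j = \tanh(b_j). The bracketed factors are the first three derivatives of tanh at b_j.
\begin{aligned}
\tanh(b_j + u_j) &= t_j + (1 - t_j^2)\, u_j
  + \frac{-2t_j + 2t_j^3}{2!}\, u_j^2
  + \frac{-2 + 8t_j^2 - 6t_j^4}{3!}\, u_j^3 + \cdots \\
u_j^p &= \sum_{n_1=0}^{r-1} \cdots \sum_{n_p=0}^{r-1}
  w_{j,n_1+1} \cdots w_{j,n_p+1}\; x(k-n_1) \cdots x(k-n_p)
\end{aligned}
```

Multiplying by $w_j$, summing over j, and matching the coefficients of the input products against the Volterra form gives the three kernels. The constant term $\sum_j w_j t_j$ has no counterpart among the kernels of order one and higher; it vanishes when all biases are zero, in which case the expansion reduces to the odd series in $\mathrm{net}_j$.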


If necessary, the equations specifying the higher order Volterra 
kernels can be obtained. 
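
The kernel formulas (22)-(24) can be checked numerically with the following sketch, which computes the kernels from a set of hypothetical network parameters and compares the third-order Volterra output with the network output (19) for a small input. The function names, the zero-based storage `W[j][n]` for $w_{j,n+1}$, and the separate constant `h0` are assumptions of this example.

```python
import math
from itertools import product


def volterra_kernels(W, w, b):
    """Kernels from eqs. (22)-(24); W[j][n] stores w_{j,n+1}.

    h0 is the constant sum_j w_j*tanh(b_j). It is not part of eq. (21) and is
    returned separately as an assumption of this sketch.
    """
    r = len(W[0])
    t = [math.tanh(b_j) for b_j in b]
    c1 = [1 - tj**2 for tj in t]
    c2 = [(-2 * tj + 2 * tj**3) / 2 for tj in t]
    c3 = [(-2 + 8 * tj**2 - 6 * tj**4) / 6 for tj in t]
    nodes = range(len(w))
    h0 = sum(w[j] * t[j] for j in nodes)
    h1 = {(n,): sum(w[j] * W[j][n] * c1[j] for j in nodes) for n in range(r)}
    h2 = {(n, m): sum(w[j] * W[j][n] * W[j][m] * c2[j] for j in nodes)
          for n, m in product(range(r), repeat=2)}
    h3 = {(n, m, l): sum(w[j] * W[j][n] * W[j][m] * W[j][l] * c3[j]
                         for j in nodes)
          for n, m, l in product(range(r), repeat=3)}
    return h0, h1, h2, h3


def volterra_output(h0, kernels, x_vec):
    """Third-order truncation of eq. (21) plus h0; x_vec[n] = x(k - n)."""
    y = h0
    for h in kernels:
        for idx, coeff in h.items():
            term = coeff
            for n in idx:
                term *= x_vec[n]
            y += term
    return y


def network_output(W, w, b, x_vec):
    """Eq. (19) evaluated directly."""
    return sum(w_j * math.tanh(b_j + sum(a * x for a, x in zip(row, x_vec)))
               for w_j, b_j, row in zip(w, b, W))


# Hypothetical trained parameters: q = 2 nodes, r = 2 delayed inputs.
W = [[0.5, -0.3], [0.2, 0.4]]
w = [1.0, -0.7]
b = [0.1, -0.2]
h0, h1, h2, h3 = volterra_kernels(W, w, b)

# For small inputs the difference is dominated by the neglected
# fourth-order term and is therefore small.
x_vec = [0.1, 0.05]
gap = abs(network_output(W, w, b, x_vec)
          - volterra_output(h0, (h1, h2, h3), x_vec))
```

For a single node with zero bias the same function returns h1 equal to the two input weights and a vanishing second-order kernel, in agreement with the worked example. The kernels produced this way are symmetric in their indices, which is one admissible distribution of the non-unique sums.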


Concluding Remarks 


This paper has demonstrated a method of determining the Volterra 
series representation of an analytic nonlinear dynamical system from a 
neural network that was trained on the nonlinear system to be 
identified. This procedure can be used to obtain a Volterra kernel
of arbitrary order from the corresponding term of the Taylor expansion. A simple
example was demonstrated and the equations for the first three 
Volterra kernels were presented. 

Current work is focused on a formal derivation of the general 
equations, in terms of network parameters, for any Volterra kernel. 